Unstructured Audio Classification for Environment Recognition
Abstract
My thesis aims to contribute towards building autonomous agents that can understand their surrounding environment using both audio and visual information. Fusing audio and visual information gives a more complete description of a scene and can enhance the system's context awareness. This work focuses on characterizing unstructured environmental sounds in order to understand and predict the context surrounding an agent. Most research on audio recognition has focused on speech and music. Less attention has been paid to the challenges and opportunities of using audio to characterize unstructured environments. Unlike speech and music, which have formant and harmonic structures, environmental sounds are considered unstructured because they are variably composed from different sound sources. My research will investigate challenging issues in characterizing environmental sounds, such as the development of appropriate feature extraction algorithms and learning techniques for modeling the dynamics of the environment. A final aspect of my research will consider the decision making of an autonomous agent based on the fusion of visual and audio information.

Acoustic Environment Recognition

We consider the task of recognizing environmental sounds in order to understand the scene (or context) surrounding an audio sensor. By auditory scene, we refer to a location with distinct acoustic characteristics, such as a coffee shop, a park, or a quiet hallway. Consider, for example, applications in robotic navigation and obstacle detection, assistive robots, surveillance, and other mobile device-based services. Many of these systems are predominantly vision-based. When they are used to understand unstructured environments, their robustness or utility is lost if visual information is compromised or totally absent.

Audio data can be acquired easily, even under challenging external conditions such as poor lighting or visual obstruction, and is relatively cheap to store and process compared with visual signals. To enhance the system's context awareness, we need to incorporate and adequately utilize such audio information. Research in general audio environment recognition has received some interest in the last few years (Ellis, 1996; Huang, 2002; Malkin et al., 2005; Eronen et al., 2006), but far less than speech or music. Other applications include wearables and context-aware applications (Waibel et al., 2004; Ellis et al., 2004).

Unstructured environment characterization is still in its infancy. Most research on environmental sounds has centered on the recognition of specific events or sounds (Cai et al., 2006). To date, only a few systems have been proposed to model raw environment audio without pre-extracting specific events or sounds (Eronen et al., 2006; Malkin et al., 2005). Similarly, our focus is not on analyzing and recognizing discrete sound events, but on characterizing general acoustic environment types as a whole.

Time- and Frequency-Domain Feature
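As a rough illustration of what time- and frequency-domain features for environmental audio can look like, the sketch below computes three common frame-level descriptors: zero-crossing rate, RMS energy, and spectral centroid. It is a minimal sketch and not the feature set used in this thesis. The function names, the frame length of 1024 samples, and the hop of 512 samples are assumptions chosen for the example, and the input is assumed to be a mono signal stored in a NumPy array.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Time domain: fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def spectral_centroid(frame, sample_rate):
    """Frequency domain: magnitude-weighted mean frequency in Hz."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0.0:
        return 0.0  # silent frame: centroid is undefined, report 0 by convention
    return float((freqs * magnitudes).sum() / total)

def frame_features(signal, sample_rate, frame_len=1024, hop=512):
    """Return an array of shape (n_frames, 3): ZCR, RMS energy, centroid.

    frame_len and hop are illustrative defaults, not values from the source.
    """
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        rms = float(np.sqrt(np.mean(frame ** 2)))
        rows.append((zero_crossing_rate(frame), rms,
                     spectral_centroid(frame, sample_rate)))
    return np.array(rows)
```

Each row of the returned array describes one short frame. A classifier for environment types would typically be trained on such rows or on statistics of them over longer windows.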
Similar resources
Chapter 1 Unstructured Environmental Audio: Representation, Classification and Modeling
Unstructured audio is an important aspect in building systems that are capable of understanding their surrounding environment through the use of audio and other modalities of information, e.g., visual, sonar, and global positioning. Consider, for example, applications in robotic navigation, assistive robotics, and other mobile device-based services, where context-aware processing is often desir...
Content Analysis for Acoustic Environment Classification in Mobile Robots
We consider the task of recognizing and learning environments for a mobile robot using audio information. Environments are mainly characterized by different types of specific sounds. Using audio enables the system to capture a semantically richer environment than using visual information alone. The goal of this paper is to investigate suitable features and the design feasibility of...
Environmental Sound Recognition With Time-Frequency Audio Features
The paper considers the task of recognizing environmental sounds for the understanding of a scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs), which describe the audio spectral shape. Environmental sounds, such as the chirping of insects and the sound of rain, which are typicall...
Text Classification Using Symbolic Data Analysis
In the real world, an operational text classification system is usually placed in an environment where the number of human-annotated training documents is small despite there being thousands of classes. In this environment, text classifiers are probably the most appropriate methods for practical systems, rather than other complex learning models. Text classifiers are basically used for free flowing t...
Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)
Commercial quadcopters, which have many private, commercial, and public sector applications, are a rapidly advancing technology. Currently, nothing guarantees the safe operation of these devices in the community. This paper presents three different methods for automatically identifying commercial quadcopters. Among these three techniques, two are based on deep neural networks in whi...